NSF PAR Search | NSF Public Access Repository

Nested Heteroscedastic Gaussian Process for Simulation Metamodeling

Zhao, J; Chen, X (December 2024, IEEE)

Full Text Available
S2FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity

Yang, X; Leng, J; Guo, G; Zhao, J; Nakada, R; Zhang, L; Yao, H; Chen, B (December 2024, NeurIPS)

Current PEFT methods for LLMs can achieve either high quality, efficient training, or scalable serving, but not all three simultaneously. To address this limitation, we investigate sparse fine-tuning and observe a remarkable improvement in generalization ability. Utilizing this key insight, we propose a family of Structured Sparse Fine-Tuning (S2FT) methods for LLMs, which concurrently achieve state-of-the-art fine-tuning performance, training efficiency, and inference scalability. S2FT accomplishes this by “selecting sparsely and computing densely”. It selects a few heads and channels in the MHA and FFN modules for each Transformer block, respectively. Next, it co-permutes weight matrices on both sides of the coupled structures in LLMs to connect the selected components in each layer into a dense submatrix. Finally, S2FT performs in-place gradient updates on all submatrices. Through theoretical analysis and empirical results, our method prevents overfitting and forgetting, delivers SOTA performance on both commonsense and arithmetic reasoning with 4.6% and 1.3% average improvements compared to LoRA, and surpasses full FT by 11.5% when generalizing to various domains after instruction tuning. Using our partial backpropagation algorithm, S2FT saves training memory up to 3× and improves latency by 1.5-2.7× compared to full FT, while delivering an average 10% improvement over LoRA on both metrics. We further demonstrate that the weight updates in S2FT can be decoupled into adapters, enabling effective fusion, fast switch, and efficient parallelism for serving multiple fine-tuned models.
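The "select, co-permute, update a dense submatrix" idea in the abstract above can be illustrated on a toy feed-forward block. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names `ffn` and `co_permute`, the matrix sizes, the chosen channel indices, and the placeholder gradient are all hypothetical, and the sketch covers only the FFN case with an elementwise ReLU.

```python
import numpy as np

def ffn(w_up, w_down, x):
    """Toy feed-forward block: down-projection of an elementwise ReLU."""
    return w_down @ np.maximum(w_up @ x, 0.0)

def co_permute(w_up, w_down, selected):
    """Move the selected hidden channels to the front of both matrices.

    Rows of w_up and columns of w_down are reordered with the same
    permutation, so the block computes the same function as before.
    """
    chosen = set(selected)
    rest = [c for c in range(w_up.shape[0]) if c not in chosen]
    perm = np.array(list(selected) + rest)
    return w_up[perm, :], w_down[:, perm]

rng = np.random.default_rng(0)
d_model, d_ff, k = 4, 8, 3           # hypothetical sizes
w_up = rng.normal(size=(d_ff, d_model))
w_down = rng.normal(size=(d_model, d_ff))
x = rng.normal(size=d_model)

selected = [6, 1, 4]                 # hypothetical sparse channel selection
p_up, p_down = co_permute(w_up, w_down, selected)

# Consistent permutation on both sides leaves the output unchanged.
same_output = np.allclose(ffn(w_up, w_down, x), ffn(p_up, p_down, x))

# The selected channels are now one contiguous slice, so the update
# touches a dense block while every other column stays frozen.
frozen_before = p_down[:, k:].copy()
grad_block = np.ones((d_model, k))   # placeholder gradient, not a real backward pass
p_down[:, :k] -= 0.1 * grad_block
frozen_unchanged = np.array_equal(frozen_before, p_down[:, k:])

print(same_output, frozen_unchanged)
```

The point of the permutation is that the trainable parameters end up in a contiguous slice, which lets the update run as an ordinary dense operation instead of a scattered sparse one.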
Full Text Available
Graphite: A Graph-Based Extreme Multi-Label Short Text Classifier for Keyphrase Recommendation

https://doi.org/10.3233/FAIA241061

Mishra, A; Dey, S; Zhao, J; Wu, M; Li, B; Madduri, K (October 2024, IOS Press)

Full Text Available
Mapping the Ge/InAl(Ga)As interfacial electronic structure and strain relief mechanism in germanium quantum dots

https://doi.org/10.1039/D4TC01587H

Hudait, Mantu K; Bhattacharya, S; Karthikeyan, S; Zhao, J; Bodnar, R J; Magill, B A; Khodaparast, G A (September 2024, Journal of Materials Chemistry C)

Germanium quantum dots (QDs) with defect-free regions and clusters of stacking faults (SFs) relieved the strain from Ge QDs.
Full Text Available
Evidence for the 15Be ground state from 12Be + 3n events

https://doi.org/10.1103/PhysRevC.110.064302

Kuchera, A N; Shahid, R; Zhao, J; Edmondson, A; DeYoung, P A; Frank, N; McDonaugh, J; Peterson-Veatch, O; Rogers, W F; Redpath, T; et al (December 2024, Physical Review C)

Full Text Available
Does Differential Privacy Impact Bias in Pretrained Language Models?

Islam, MK; Wang, A; Wang, T; Ji, Y; Fox, J; Zhao, J (June 2024, IEEE Data Engineering Bulletin (Special Issue on Privacy-preserving Data Management) Vol. 48 No. 2, June 2024.)
Wang, H; Xiao, X (Ed.)
Differential privacy (DP) is applied when fine-tuning pre-trained language models (LMs) to limit leakage of training examples. While most DP research has focused on improving a model’s privacy-utility tradeoff, some find that DP can be unfair to or biased against underrepresented groups. In this work, we extensively analyze the impact of DP on bias in LMs. We find differentially private training can increase the model bias against protected groups with respect to AUC-based bias metrics. DP makes it more difficult for the model to differentiate between the positive and negative examples from the protected groups and other groups in the rest of the population. Our results also show that the impact of DP on bias is affected by both the privacy protection level and the underlying distribution of the dataset.
Full Text Available
Pressure induced Invar effect in Fe55Ni45: An experimental study with nuclear resonant scattering

https://doi.org/10.1103/PhysRevB.110.144419

Guzman, P; Lohaus, S H; Bernal-Choban, C M; Fultz, B; Zhao, J Y; Shen, G; Hu, M Y; Alp, E E; Lavina, B (October 2024, Physical Review B)

Pressure-dependent synchrotron X-ray diffraction (XRD), nuclear resonant inelastic X-ray scattering (NRIXS), and nuclear forward scattering (NFS) measurements were made on 57Fe55Ni45. XRD measurements were at 298 K and 392 K at pressures up to 20 GPa, confirming a pressure-induced Invar effect between 7 GPa and 13 GPa. A decrease of the 57Fe magnetic moment was found in NFS measurements under pressure, showing an increase in magnetic entropy. The 57Fe phonon density of states (DOS) was obtained from NRIXS measurements. The low thermal expansion in the high-pressure Invar region originates from a competition between the thermal expansion from spins and phonons as calculated from Maxwell relations. The longitudinal phonon modes changed their pressure dependence near the Curie transition, which is evidence for a spin-phonon interaction.
Full Text Available
Valence instability and collapse of ferromagnetism in EuB6 at high pressures

https://doi.org/10.1016/j.jmmm.2024.172203

Kutelak, LO; Sereika, R; Fabbris, G; Francisco, L; Lombardi, G; Poldi, EHT; Zhao, J; Alp, EE; Souza_Neto, NM; Rosa, PFS; et al (June 2024, Journal of Magnetism and Magnetic Materials)

Full Text Available
Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Yao, S; Yu, D; Zhao, J; Shafran, I; Griffiths, T; Cao, Y; Narasimhan, K (December 2023, Advances in Neural Information Processing Systems)

Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, “Tree of Thoughts” (ToT), which generalizes over the popular “Chain of Thought” approach to prompting language models, and enables exploration over coherent units of text (“thoughts”) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models’ problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: https://github.com/princeton-nlp/tree-of-thought-llm.
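The search over intermediate "thoughts" described in the abstract above can be sketched as a breadth-limited tree search. This is a hedged illustration only: in the paper the proposal and evaluation steps are language-model calls, whereas here `propose`, `evaluate`, and `is_solution` are hypothetical stand-in functions on a toy arithmetic task, and breadth-first search with pruning is just one possible strategy.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, ...]  # a partial solution: the sequence of steps taken so far

def tree_of_thoughts_bfs(
    root: State,
    propose: Callable[[State], List[State]],
    evaluate: Callable[[State], float],
    is_solution: Callable[[State], bool],
    breadth: int,
    max_depth: int,
) -> Optional[State]:
    """Expand every kept state, score the children, keep the best few."""
    frontier = [root]
    for _ in range(max_depth):
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            return None
        for candidate in candidates:
            if is_solution(candidate):
                return candidate
        # Pruning: only the highest-scoring partial solutions survive.
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:breadth]
    return None

# Toy task (an assumption for illustration): reach TARGET by adding steps.
TARGET = 10
STEPS = (1, 3, 4)

def propose(state: State) -> List[State]:
    """Stand-in for the thought generator: extend by each step that fits."""
    return [state + (s,) for s in STEPS if sum(state) + s <= TARGET]

def evaluate(state: State) -> float:
    """Stand-in for self-evaluation: closer to the target scores higher."""
    return -float(TARGET - sum(state))

def is_solution(state: State) -> bool:
    return sum(state) == TARGET

result = tree_of_thoughts_bfs((), propose, evaluate, is_solution,
                              breadth=2, max_depth=6)
print(result)
```

Keeping several candidates per level is what distinguishes this from a single left-to-right chain: a weak early step can be dropped in favor of a sibling instead of committing the whole solution to it.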
Full Text Available
Cognitive diversity in context: US-China differences in children’s reasoning, visual attention, and social cognition

Carstensen, A; Cao, A; Tan, A; Liu, D; Liu, Y; Bui, M; Wang-Zhao, J; Han, Q; Walker, C; Frank, M (January 2024, Proceedings of the Annual Conference of the Cognitive Science Society)

Full Text Available
